Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model
Nijkamp, Erik; Hill, Mitch; Zhu, Song-Chun; Wu, Ying Nian
This paper studies a curious phenomenon in learning an energy-based model (EBM) using MCMC. In each learning iteration, we generate synthesized examples by running a non-convergent, non-mixing, and non-persistent short-run MCMC toward the current model, always starting from the same initial distribution, such as a uniform noise distribution, and always running a fixed number of MCMC steps. After generating the synthesized examples, we update the model parameters according to the maximum likelihood learning gradient, as if the synthesized examples were fair samples from the current model. We treat this non-convergent short-run MCMC as a learned generator model or flow model, and we provide arguments for treating it as a valid model. We show that the learned short-run MCMC is capable of generating realistic images. More interestingly, unlike a traditional EBM or MCMC, the learned short-run MCMC can also reconstruct observed images and interpolate between images, like a generator or flow model. The code can be found in the Appendix.
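To make the learning loop concrete, the following is a minimal runnable sketch in PyTorch. It is an illustration under stated assumptions, not the paper's released code: the network architecture, 32x32 input size, step size, number of steps K, uniform initialization range, and optimizer settings are all illustrative choices, and we write the model as p_theta(x) proportional to exp(f_theta(x)), so the Langevin updates ascend f_theta.

# Minimal sketch of short-run MCMC learning of an EBM (illustrative only).
# Assumes p_theta(x) ~ exp(f_theta(x)) and 32x32 RGB images scaled to [-1, 1].
import torch
import torch.nn as nn

class Score(nn.Module):
    # Small ConvNet computing the scalar f_theta(x) for a batch of images.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 1))  # assumes 32x32 inputs
    def forward(self, x):
        return self.net(x).squeeze(-1)

def short_run_langevin(f, x, K=100, step=0.01):
    # Non-persistent short-run chain: always a fixed number K of Langevin
    # steps, always starting from the freshly drawn initialization x.
    for _ in range(K):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(f(x).sum(), x)[0]
        x = x + 0.5 * step ** 2 * grad + step * torch.randn_like(x)
    return x.detach()

f = Score()
opt = torch.optim.Adam(f.parameters(), lr=1e-4)
for x_data in loader:  # `loader` (assumed) yields batches of real images
    x_init = 2 * torch.rand_like(x_data) - 1  # fixed uniform noise init
    x_syn = short_run_langevin(f, x_init)     # synthesized examples
    # Maximum-likelihood-style update, treating x_syn as if they were
    # fair samples from the current model.
    loss = f(x_syn).mean() - f(x_data).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()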
Reply to Reviewer 2: Thank you for the insightful and comprehensive summary of our work. We will add this information in the revision. We can still learn the short-run MCMC successfully. We shall also try to implement the method of Burda et al. (thank you for the reference). Following your advice, we conducted 1D and 2D experiments.
Reviews: Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model
The highlighted phenomenon (the convergence of a short-run MCMC while training EBMs) seems novel and very interesting. The conventional wisdom is that a simple MCMC algorithm like Langevin dynamics takes a long time to converge to the stationary distribution of the EBM when initialized far from it. The paper argues that if the EBM is instead trained by generating negative samples from a short-run MCMC, then the short-run chain converges close to the data distribution (the authors argue that this "closeness" is related to moment matching). The theoretical argument offered to explain this phenomenon is suggestive but ultimately did not convince the reviewer: even the convergence of the algorithm does not seem to be explained, and Section 4.2 seems particularly weak, as it is not clear what the "generalized moment matching objective" is trying to achieve. However, the empirical evidence for the convergence of short-run MCMC in EBMs is very compelling: the training procedure is significantly simpler than other procedures used to train EBMs, yet it produces highly competitive results on several image datasets.
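For context, the moment-matching reading follows from the standard form of the maximum likelihood gradient for an EBM. Writing p_theta(x) = exp(f_theta(x)) / Z(theta), a sketch of the identity (notation assumed here, not taken verbatim from the paper):

\nabla_\theta \, \mathbb{E}_{p_{\rm data}}\!\left[\log p_\theta(x)\right]
= \mathbb{E}_{p_{\rm data}}\!\left[\nabla_\theta f_\theta(x)\right]
- \mathbb{E}_{p_\theta}\!\left[\nabla_\theta f_\theta(x)\right].

When the expectation under p_theta is replaced by an expectation under the short-run distribution q_theta, a fixed point of the parameter updates instead satisfies \mathbb{E}_{p_{\rm data}}[\nabla_\theta f_\theta(x)] = \mathbb{E}_{q_\theta}[\nabla_\theta f_\theta(x)], i.e., q_theta matches the data on the statistics \nabla_\theta f_\theta. This is the sense in which "closeness" is tied to moment matching.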
A new energy-based generative model for images is proposed. The paper suggests running Langevin dynamics in the data domain to create synthetic samples, and updating the model parameters based on these synthesized images in an 'analysis by synthesis' framework. The generative model allows for unconditional generation and interpolation. It is interesting that short-run MCMC can be used in this context despite not having converged. The effect of the hyperparameter K (the number of MCMC steps) could have been explored more thoroughly.
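For reference, the Langevin update in the data domain referred to here takes the standard form (the step size s and the convention p_theta(x) proportional to exp(f_theta(x)) are assumptions of this sketch):

x_{k+1} = x_k + \frac{s^2}{2} \nabla_x f_\theta(x_k) + s \, \epsilon_k,
\qquad \epsilon_k \sim \mathcal{N}(0, I), \quad k = 0, \dots, K-1,

with x_0 drawn from the fixed initial noise distribution and K held constant throughout training.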